Utterance Clustering Using Stereo Audio Channels

Abstract

Utterance clustering is one of the actively researched topics in audio signal processing and machine learning. This study aims to improve the performance of utterance clustering by processing multichannel (stereo) audio signals. Processed audio signals were generated by combining left- and right-channel audio signals in a few different ways, and embedded features (also called d-vectors) were then extracted from those processed audio signals. The study applied the Gaussian mixture model for supervised utterance clustering. In the training phase, a parameter-sharing Gaussian mixture model was obtained to train the model for each speaker. In the testing phase, the speaker with the maximum likelihood was selected as the detected speaker. Results of experiments with real audio recordings of multiperson discussion sessions showed that the proposed method, which used multichannel audio signals, achieved significantly better performance than a conventional method with mono-audio signals in more complicated conditions.
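The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several parts are assumptions made for illustration: the particular channel combinations (average and half-difference), the use of a single diagonal Gaussian per speaker with a variance pooled across speakers as a simplified stand-in for the parameter-sharing Gaussian mixture model, and the toy two-dimensional vectors that take the place of real d-vectors. All function names are hypothetical.

```python
import math


def combine_channels(left, right):
    """Derive several mono signals from a stereo pair.

    The 'sum' and 'diff' variants are assumed examples; the abstract
    only says the channels were combined "in a few different ways".
    """
    return {
        "left": list(left),
        "right": list(right),
        "sum": [(l + r) / 2 for l, r in zip(left, right)],
        "diff": [(l - r) / 2 for l, r in zip(left, right)],
    }


def fit_speaker_models(train):
    """Fit one mean per speaker and one diagonal variance shared by all.

    train maps a speaker label to a list of feature vectors (stand-ins
    for d-vectors). Sharing the variance is an assumed, simplified
    reading of "parameter-sharing".
    """
    dim = len(next(iter(train.values()))[0])
    means = {
        spk: [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        for spk, vecs in train.items()
    }
    total = sum(len(vecs) for vecs in train.values())
    var = [
        sum(
            (v[d] - means[spk][d]) ** 2
            for spk, vecs in train.items()
            for v in vecs
        ) / total + 1e-6  # small floor keeps the variance positive
        for d in range(dim)
    ]
    return means, var


def log_likelihood(x, mean, var):
    """Log density of x under a diagonal Gaussian."""
    return -0.5 * sum(
        math.log(2 * math.pi * s) + (xi - m) ** 2 / s
        for xi, m, s in zip(x, mean, var)
    )


def detect_speaker(x, means, var):
    """Return the speaker whose model gives x the maximum likelihood."""
    return max(means, key=lambda spk: log_likelihood(x, means[spk], var))


if __name__ == "__main__":
    # Toy feature vectors standing in for d-vectors of two speakers.
    train = {
        "A": [[0.0, 0.1], [0.2, -0.1]],
        "B": [[2.0, 2.1], [1.8, 1.9]],
    }
    means, var = fit_speaker_models(train)
    print(detect_speaker([0.1, 0.0], means, var))
```

In a fuller version, each combined signal from combine_channels would be passed through a d-vector extractor before fitting, and the single Gaussian would be replaced by a mixture with several components per speaker.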

Related resources

Automatic Music Clustering using Audio Attributes

Abstract: Music brings people together; it allows us to experience the same emotions. Currently, musical genre classification is done manually and requires considerable effort even from the trained human ear. Therefore, clustering songs automatically and then drawing valuable insights from those clusters is an interesting problem and can add great value to music information retrieval systems. Most of ...

Parametric Coding of Stereo Audio

Parametric-stereo coding is a technique to efficiently code a stereo audio signal as a monaural signal plus a small amount of parametric overhead to describe the stereo image. The stereo properties are analyzed, encoded, and reinstated in a decoder according to spatial psychoacoustical principles. The monaural signal can be encoded using any (conventional) audio coder. Experiments show that the...

Speaker indexing in audio archives using test utterance Gaussian mixture modeling

Speaker Indexing has recently emerged as an important task due to the rapidly growing volume of audio archives. Current filtration techniques still suffer from problems both in accuracy and efficiency. The major reason for the drawbacks of existing solutions is the use of inaccurate anchor models. The contribution of this paper is two-fold. On the theoretical side, a new method is developed for...

High Quality Scalable Stereo Audio Coding

This paper proposes an efficient, low complexity, scalable audio coder based on a combination of two embedded coding algorithms: the SPIHT (set partitioning in hierarchical trees) coding algorithm [1] and an embedded, nested binary set partitioning (NBSP) algorithm. The SPIHT algorithm, considered to be the premier state-of-the-art algorithm in still image compression, is used for the low frequ...

Distributed Virtual Conference Stereo Audio Reconstruction

To enhance the realism of virtual conferences, this paper proposes a distributed virtual conference stereo audio reconstruction model. The spatial audio parameter inter-aural level difference (ILD) is used to reconstruct the spatial sound field for each listener. The distributed synthesis system is designed to achieve a lower network payload.

Journal

Journal title: Computational Intelligence and Neuroscience

Year: 2021

ISSN: 1687-5265, 1687-5273

DOI: https://doi.org/10.1155/2021/6151651